Marivirga tractuosa (Lewin 1969) Nedashkovskaya et al. 2010 is the type species of the genus
Marivirga, which belongs to the family Flammeovirgaceae. Members of this genus are of
interest because of their gliding motility. The species is of interest because representative
strains show resistance to several antibiotics, including gentamicin, kanamycin, neomycin,
polymyxin and streptomycin. This is the first complete genome sequence of a member of the
family Flammeovirgaceae. Here we describe the features of this organism, together with the
complete genome sequence and annotation. The 4,511,574 bp long chromosome and the 4,916 bp
plasmid with their 3,808 protein-coding and 49 RNA genes are a part of the Genomic
Encyclopedia of Bacteria and Archaea project.



Introduction 



Strain H-43T (= DSM 4126 = ATCC 23168 = NBRC 15989) is the type strain of the species
Marivirga tractuosa. The genus Marivirga, whose type species is M. tractuosa, contains only
one additional species: M. sericea. The generic name 'Marivirga' derives from the Latin words
'mare', the sea, and 'virga', rod, meaning 'a rod that inhabits marine environments' [1]. The
species epithet 'tractuosa' is a Latin adjective meaning 'that draws to itself, gluey,
viscous', probably referring to the phenotype of gliding motility [1]. Strain H-43T was
isolated in 1969 from a beach sand sample collected at Nhatrang (South China Sea), Vietnam [2]
and was initially named 'Microscilla tractuosa' by Lewin [3], but the name was never validly
published. In 1974, Leadbetter transferred the strain to the genus Flexibacter [4]. In 2010,
strain H-43T was reclassified into the novel genus Marivirga on the basis of a polyphasic
approach [1]. Other strains have been isolated worldwide: from mud in the Orne Estuary, France
and silty sand in Penang, Malaysia [5], as well as from brown mud from Muigh Inis, Ireland,
underneath frozen sand in the upper littoral zone at Auke Bay, Alaska, red-brown mud from
Helgoland Island, Germany, and brown sand at Moreton Bay, Australia [6]. These sampling sites
suggest an ecological preference of M. tractuosa for wet terrestrial habitats [1,2]. Here we
present a summary classification and a set of features for M. tractuosa strain H-43T, together
with the description of the complete genomic sequencing and annotation.

Classification and features 

The 16S rRNA gene sequence of strain H-43T shares the highest degree of similarity (99.1%)
with M. sericea, the only other member of the genus Marivirga (Figure 1) [12], and with an
uncultured Bacteroidetes clone SHBC423 (99%, GQ350249) from oceanic dead zones [13]. A
representative genomic 16S rRNA gene sequence of M. tractuosa was compared using NCBI BLAST
under default values with the most recent release of the Greengenes database [14], and the
relative frequencies, weighted by BLAST scores, of taxa and keywords (reduced to their stem
[15]) were determined. The five most frequent genera were Flexibacter (= not yet renamed
Marivirga hits) (26.8%), Pontibacter (21.6%), Hymenobacter (21.4%), Adhaeribacter (8.3%) and
Microscilla (8.0%) (57 hits in total). The highest-scoring environmental sequence was EU447282
('Flexibacteraceae bacterium KMM 6276'), which showed an identity of 100.0% and an HSP
coverage of 97.6%, but most probably represents a Marivirga strain. The five most frequent
keywords within the labels of environmental samples which yielded hits were 'microbi' (4.0%),
'sediment' (3.1%), 'site' (1.9%), 'group' (1.7%) and 'coral' (1.6%) (192 hits in total). These
keywords support the ecological preference of M. tractuosa for wet habitats, as deduced from
the sampling sites of the cultivated strains. Environmental samples which yielded hits of a
higher score than the highest scoring species were not found.
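
The score-weighted frequencies described above can be illustrated with a minimal sketch. It assumes that each hit contributes its BLAST score to its taxon label and that the totals are then normalized to percentages; the exact weighting used for the published figures is not specified here, so this is only one plausible reading. The function name `weighted_frequencies` and the example hits and scores are hypothetical and are not data from this study.

```python
from collections import defaultdict


def weighted_frequencies(hits):
    """Return {label: percent of summed score}, highest first.

    `hits` is an iterable of (label, blast_score) pairs. Assumption: the
    weight of a label is simply the sum of the scores of its hits.
    """
    totals = defaultdict(float)
    for label, score in hits:
        totals[label] += score
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return {label: 100.0 * total / grand_total for label, total in ranked}


# Hypothetical hits (genus label, BLAST bit score), for illustration only.
example_hits = [
    ("Flexibacter", 900.0),
    ("Flexibacter", 700.0),
    ("Pontibacter", 800.0),
    ("Hymenobacter", 600.0),
]
freqs = weighted_frequencies(example_hits)
for genus, percent in freqs.items():
    print(f"{genus}: {percent:.1f}%")
```

The same function could be applied to stemmed keywords instead of genus names by passing (keyword, score) pairs.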

Figure 1 shows the phylogenetic neighborhood of M. tractuosa H-43T in a 16S rRNA based tree.
The sequences of the two identical 16S rRNA gene copies in the genome do not differ from the
previously published 16S rRNA sequence (AB078072).

The cells of strain H-43T are long, slender and flexible rods 0.4-0.5 µm in diameter and
10-50 µm in length or longer (Figure 2). Strain H-43T is a Gram-negative, non-spore-forming
bacterium (Table 1) that exhibits gliding motility [1].



[Figure 1: phylogenetic tree (image not reproduced). Taxa shown, with sequence accessions:
Flammeovirga aprica (AB247553), Flammeovirga yaeyamensis (AB247554), Flammeovirga arenaria
(AB078078), Flammeovirga kamogawensis (AB251933), Perexilibacter aurantiacus (AB276355),
Limibacter armeniacum (AB359907), Thermonema rossianum (Y08956), Persicobacter psychrovividus
(AB260934), Persicobacter diffluens (AB260929), Flexithrix dorotheae (AB078077), Rapidithrix
thailandica (AB265192), Reichenbachiella agariperforans (AB058919), Marivirga tractuosa
(IMG2501762315), Marivirga sericea (AB078081), Marinoscillum furvescens (AB078079),
Marinoscillum pacificum (DQ660388), Roseivirga ehrenbergii (AY608410), Roseivirga
echinicomitans (AY753206), Roseivirga spongicola (DQ080996), Fabibacter halotolerans
(DQ080995), Sphingobacterium spiritivorum (EF090267).]

Figure 1. Phylogenetic tree highlighting the position of M. tractuosa relative to the other
type strains within the family Flammeovirgaceae. The tree was inferred from 1,408 aligned
characters [7,8] of the 16S rRNA gene sequence under the maximum likelihood criterion [9] and
rooted in accordance with the family Sphingobacteriaceae. The branches are scaled in terms of
the expected number of substitutions per site. Numbers above branches are support values from
1,000 bootstrap replicates [10] if larger than 60%. Lineages with type strain genome
sequencing projects registered in GOLD [11] are shown in blue, published genomes in bold.
Strain H-43T is strictly aerobic and chemoorganotrophic [1]. Growth is observed at 10-40°C and
with 0.5-10% NaCl, with optimal growth at 28-32°C and 4-7% NaCl [1]. Colonies are circular,
shiny and 2-4 mm in diameter after 72 h of incubation on marine agar [1]. They are usually
dark orange in color, but whitish or yellow-pigmented variants may occur [1]. Pigment type
three was found in strain H-43T, the main pigment being saproxanthin [2]. In n-hexane, the
absorption maxima of the pigments from crude extract were 425 nm, 447 nm, 471 nm and 505 nm
[2]. Flexirubin-type pigments are not produced.

Arginine dihydrolase, ornithine decarboxylase, lysine decarboxylase and tryptophan deaminase
activities were described as absent [1]; however, Srinivas et al. [22] found that strain H-43T
could utilize arginine, and also that growth on alanine and cysteine was weak. Nitrate is not
reduced. Indole and acetoin (Voges-Proskauer reaction) are not produced [1]. Gelatin, Tween
20, Tween 40, Tween 80 and DNA are hydrolyzed, as well as agar, starch, urea, cellulose
(CM-cellulose and filter paper) and chitin [1,2]; however, again in contrast to the original
description [1], Srinivas et al. reported that the strain does not hydrolyze Tween 20, Tween
40 or Tween 80 [22]. Acid is not produced from L-arabinose, cellobiose, L-fucose, D-galactose,
glycerol, lactose, melibiose, raffinose, L-rhamnose, L-sorbose, sucrose, trehalose, DL-xylose,
N-acetylglucosamine, citrate, acetate, fumarate, malate, adonitol, dulcitol, inositol or
mannitol. In the API 50 CH gallery, acid is produced only from esculin and arbutin. Production
of hydrogen sulfide and hydrolysis of casein are variable [1].

Citrate is utilized, but lactose, inositol, gluconate, caprate, phenylalanine and malonate are
not. Utilization of arabinose, D-glucose, D-mannose, sucrose, mannitol, N-acetylglucosamine,
maltose, adipate, malate and sorbitol is variable [1]. Glucose, glycerol, galactose and
sucrose (5.1 g/l each) are used as carbon sources and stimulate the growth of strain H-43T,
while sodium acetate and sodium lactate do not [2]. Nitrogen sources supporting growth include
tryptone (1 g/l) and casamino acids (1 g/l), but not sodium glutamate or NO3- [2].

Alkaline phosphatase, esterase (C4), esterase lipase (C8), leucine arylamidase, valine
arylamidase, cystine arylamidase, α-chymotrypsin, acid phosphatase,
naphthol-AS-BI-phosphohydrolase, β-galactosidase and α- and β-glucosidase activities are
present, but lipase (C14), trypsin, α-galactosidase, β-glucuronidase,
N-acetyl-β-glucosaminidase, α-mannosidase and α-fucosidase activities are negative in the API
ZYM gallery [1]. In litmus milk, the dye was reduced and clotting occurred. Moreover, litmus
turned pink due to acidification, and the curd was re-digested because of proteolysis [2].

Strain H-43T is sensitive to ampicillin (10 µg), benzylpenicillin (10 U), carbenicillin
(100 µg), chloramphenicol (30 µg), doxycycline (10 µg), erythromycin (15 µg), lincomycin
(15 µg), oleandomycin (15 µg) and tetracycline (30 µg), but resistant to gentamicin (10 µg),
kanamycin (30 µg), neomycin (30 µg), polymyxin (300 U) and streptomycin (30 µg) [1].
Cytochrome oxidase, catalase and alkaline phosphatase tests were positive [1], although
Srinivas et al. [22] found only a weak reaction in the catalase test. When growing, the strain
was able to degrade dihydroxyphenylalanine and tyrosine (5 g/l) [2].




Figure 2. Scanning electron micrograph of M. tractuosa H-43T
Table 1. Classification and general features of M. tractuosa H-43T according to the MIGS recommendations [16]

MIGS ID   Property                 Term                                                 Evidence code
          Current classification   Domain Bacteria                                      TAS [17]
                                   Phylum Bacteroidetes                                 TAS [19]
                                   Class Sphingobacteria                                TAS [18]
                                   Order Sphingobacteriales                             TAS [18]
                                   Family Flammeovirgaceae                              TAS [18]
                                   Genus Marivirga                                      TAS [1]
                                   Species Marivirga tractuosa                          TAS [1]
                                   Type strain H-43                                     TAS [1]
          Gram stain               negative                                             TAS [1,2]
          Cell shape               long, slender and flexible rods                      TAS [1]
          Motility                 motile by gliding                                    TAS [1,2]
          Sporulation              no                                                   TAS [1,2]
          Temperature range        10°C-40°C                                            TAS [1]
          Optimum temperature      28°C-32°C                                            TAS [1,2]
          Salinity                 0.5%-10% NaCl                                        TAS [1]
MIGS-22   Oxygen requirement       strictly aerobic                                     TAS [1,2]
          Carbon source            glycerol, glucose, galactose, sucrose                TAS [2]
          Energy metabolism        chemoorganotroph                                     TAS [1]
MIGS-6    Habitat                  wet terrestrial habitats, occasionally fresh water   TAS [2]
MIGS-15   Biotic relationship      free-living                                          NAS
MIGS-14   Pathogenicity            not reported                                         NAS
          Biosafety level          1                                                    TAS [20]
          Isolation                beach sand sample                                    TAS [1]
MIGS-4    Geographic location      Nhatrang (South China Sea), Vietnam                  TAS [1]
MIGS-5    Sample collection time   1969 or before                                       TAS [2]
MIGS-4.1  Latitude                 12.25
MIGS-4.2  Longitude                109.20                                               NAS
MIGS-4.3  Depth                    not reported                                         NAS
MIGS-4.4  Altitude                 not reported                                         NAS

Evidence codes - IDA: Inferred from Direct Assay (first time in publication); TAS: Traceable
Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author
Statement (i.e., not directly observed for the living, isolated sample, but based on a
generally accepted property for the species, or anecdotal evidence). These evidence codes are
from the Gene Ontology project [21]. If the evidence code is IDA, then the property was
directly observed by one of the authors or an expert mentioned in the acknowledgements.



Chemotaxonomy 

The predominant cellular fatty acids of strain H-43T were iso-C15:0 (36.8%), iso-C15:1 (23.0%)
and iso-C17:0 3-OH (12.2%), with a detailed listing given in Nedashkovskaya et al. [1].
Srinivas et al. reported fundamentally different observations for strain H-43T, with C16:0
(69% of the total fatty acids) being the most abundant fatty acid, whereas iso-C15:0 was not
detectable [22]. The main respiratory quinone is MK-7 [1].



Genome sequencing and annotation 

Genome project history 

This organism was selected for sequencing on the basis of its phylogenetic position [23], and
is part of the Genomic Encyclopedia of Bacteria and Archaea project [24]. The genome project
is deposited in the Genomes On Line Database [11] and the complete genome sequence is
deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint
Genome Institute (JGI). A summary of the project information is shown in Table 2.
Table 2. Genome sequencing project information

MIGS ID    Property                     Term
MIGS-31    Finishing quality            Finished
MIGS-28    Libraries used               Three genomic libraries: one 454 pyrosequence standard library,
                                        one 454 PE library (10 kb insert size), one Illumina library
MIGS-29    Sequencing platforms         Illumina GAii, 454 GS FLX Titanium
MIGS-31.2  Sequencing coverage          60.1 × Illumina; 44.4 × pyrosequence
MIGS-30    Assemblers                   Newbler version 2.1-PreRelease-4-28-2009-gcc-3.4.6-threads,
                                        Velvet, phrap
MIGS-32    Gene calling method          Prodigal 1.4, GenePRIMP
           INSDC ID                     CP002349 (chromosome)
                                        CP002350 (plasmid FTRAC01)
           GenBank Date of Release      [illegible in source]
           GOLD ID                      [illegible in source]
           NCBI project ID              37901
           Database: IMG-GEBA           2503538019
MIGS-13    Source material identifier   DSM 4126
           Project relevance            Tree of Life, GEBA



Growth conditions and DNA isolation 

M. tractuosa H-43T, DSM 4126, was grown in DSMZ medium 172 (Cytophaga (marine) medium) [25] at
25°C. DNA was isolated from 0.5-1 g of cell paste using the MasterPure Gram-positive DNA
purification kit (Epicentre MGP04100) following the standard protocol as recommended by the
manufacturer, with modification st/DL for cell lysis as described in Wu et al. [24]. DNA is
available through the DNA Bank Network [26,27].

Genome sequencing and assembly 

The genome was sequenced using a combination of Illumina and 454 sequencing platforms. All
general aspects of library construction and sequencing can be found at the JGI website [28].
Pyrosequencing reads were assembled using the Newbler assembler version
2.1-Pre-release-4-28-2009-gcc-3.4.6-threads (Roche). The initial Newbler assembly consisted of
115 contigs in one scaffold and was converted into a phrap [29] assembly by making fake reads
from the consensus and collecting the read pairs in the 454 paired end library. Illumina GAii
sequencing data (496 Mb) was assembled with Velvet [30], and the consensus sequences were
shredded into 1.5 kb overlapped fake reads and assembled together with the 454 data. The 454
draft assembly was based on 201.9 Mb of 454 draft data and all of the 454 paired end data.
Newbler parameters were -consed -a 50 -l 350 -g -m -ml 20. The Phred/Phrap/Consed software
package [29] was used for sequence assembly and quality assessment in the following finishing
process. After the shotgun stage, reads were assembled with parallel phrap (High Performance
Software, LLC). Possible mis-assemblies were corrected with gapResolution [28], Dupfinisher,
or sequencing cloned bridging PCR fragments with subcloning or transposon bombing (Epicentre
Biotechnologies, Madison, WI) [31]. Gaps between contigs were closed by editing in Consed, by
PCR and by Bubble PCR primer walks (J.-F. Chang, unpublished). A total of 336 additional
reactions were necessary to close gaps and to raise the quality of the finished sequence.
Illumina reads were also used to correct potential base errors and increase consensus quality
using the software Polisher developed at JGI [32]. The error rate of the completed genome
sequence is less than 1 in 100,000. Together, the combination of the Illumina and 454
sequencing platforms provided 104.5 × coverage of the genome. The final assembly contains
589,653 pyrosequence and 7,543,442 Illumina reads.
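
The shredding of a consensus sequence into overlapping fake reads can be sketched as follows. This is only an illustration of the general idea, not the JGI pipeline: the 1.5 kb read length comes from the text, whereas the amount of overlap is not stated, so the 500 bp default below is an assumption, and the function name `shred` is hypothetical.

```python
def shred(consensus, read_len=1500, overlap=500):
    """Cut `consensus` into overlapping fragments ("fake reads").

    read_len: fragment length in bases (1.5 kb, as stated in the text).
    overlap:  bases shared by consecutive fragments (assumed value).
    The last fragment may be shorter than read_len.
    """
    if read_len <= 0 or not 0 <= overlap < read_len:
        raise ValueError("need read_len > 0 and 0 <= overlap < read_len")
    if not consensus:
        return []
    step = read_len - overlap
    reads = []
    start = 0
    while True:
        reads.append(consensus[start:start + read_len])
        if start + read_len >= len(consensus):
            break  # this fragment already reaches the end of the contig
        start += step
    return reads


# Toy 4 kb contig used only to show the fragment layout.
contig = "ACGT" * 1000
fake_reads = shred(contig)
print(len(fake_reads), [len(r) for r in fake_reads])
```

Because consecutive fragments share `overlap` bases, an overlap-based assembler such as phrap can rejoin them while also merging them with reads from the other platform.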

Genome annotation 

Genes were identified using Prodigal [33] as part of the Oak Ridge National Laboratory genome
annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline
[34]. The predicted CDSs were translated and used to search the National Center for
Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG,
COG, and InterPro databases. Additional gene prediction analysis and functional annotation was
performed within the Integrated Microbial Genomes - Expert Review (IMG-ER) platform [35].

Genome properties 

The genome consists of a 4,511,574 bp long chromosome with a 35.5% G+C content and a 4,916 bp
plasmid with 40% G+C content (Figure 3 and Table 3). Of the 3,857 genes predicted, 3,808 were
protein-coding genes and 49 were RNA genes; 51 pseudogenes were also identified. The majority
of the protein-coding genes (62.2%) were assigned a putative function, while the remaining
ones were annotated as hypothetical proteins. The distribution of genes into COGs functional
categories is presented in Table 4.
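
As a quick consistency check, the headline figures in this section and in Table 3 can be recomputed from the raw counts. The sketch below uses only numbers reported in the text and in Table 3; the variable names and the helper `pct` are illustrative, and percentages of genes are taken relative to the total gene count, which is an assumption that happens to reproduce the tabulated values.

```python
# Replicon sizes and gene counts as reported in the text and Table 3.
chromosome_bp = 4_511_574
plasmid_bp = 4_916
coding_bp = 4_029_412
gc_bp = 1_604_111
total_genes = 3_857
protein_coding = 3_808
rna_genes = 49
with_function = 2_398


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to 2 decimals."""
    return round(100.0 * part / whole, 2)


genome_bp = chromosome_bp + plasmid_bp

print("genome size (bp):", genome_bp)
print("coding fraction (%):", pct(coding_bp, genome_bp))
print("G+C content (%):", pct(gc_bp, genome_bp))
print("protein-coding genes (%):", pct(protein_coding, total_genes))
print("RNA genes (%):", pct(rna_genes, total_genes))
print("genes with function prediction (%):", pct(with_function, total_genes))
```

Under these assumptions the two replicons add up to the genome size given in Table 3, and protein-coding plus RNA genes add up to the total gene count.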




Figure 3. Graphical circular map of the chromosome (plasmid map not shown). From outside to the center: 
Genes on forward strand (color by COG categories), Genes on reverse strand (color by COG categories), 
RNA genes (tRNAs green, rRNAs red, other RNAs black), GC content, GC skew. 
Table 3. Genome Statistics

Attribute                          Value       % of Total
Genome size (bp)                   4,516,490   100.00%
DNA coding region (bp)             4,029,412   89.22%
DNA G+C content (bp)               1,604,111   35.52%
Number of replicons                2
Extrachromosomal elements          1
Total genes                        3,857       100.00%
RNA genes                          49          1.27%
rRNA operons                       2
Protein-coding genes               3,808       98.73%
Pseudo genes                       51          1.32%
Genes with function prediction     2,398       62.17%
Genes in paralog clusters          396         10.27%
Genes assigned to COGs             2,375       61.58%
Genes assigned Pfam domains        2,609       67.64%
Genes with signal peptides         1,113       28.86%
Genes with transmembrane helices   997         25.85%
CRISPR repeats                     0





Table 4. Number of genes associated with the general COG functional categories 



Code   Value   %age   Description
J      157     6.1    Translation, ribosomal structure and biogenesis
A      0       0.0    RNA processing and modification
K      163     6.3    Transcription
L      131     5.1    Replication, recombination and repair
B      1       0.1    Chromatin structure and dynamics
D      30      1.2    Cell cycle control, cell division, chromosome partitioning
Y      0       0.0    Nuclear structure
V      63      2.4    Defense mechanisms
T      184     7.1    Signal transduction mechanisms
M      236     9.1    Cell wall/membrane/envelope biogenesis
N      10      0.4    Cell motility
Z      1       0.0    Cytoskeleton
W      0       0.0    Extracellular structures
U      37      1.4    Intracellular trafficking and secretion, and vesicular transport
O      112     4.3    Posttranslational modification, protein turnover, chaperones
C      126     4.9    Energy production and conversion
G      102     3.9    Carbohydrate transport and metabolism
E      217     8.4    Amino acid transport and metabolism
F      67      2.6    Nucleotide transport and metabolism
H      118     4.6    Coenzyme transport and metabolism
I      99      3.8    Lipid transport and metabolism
P      136     5.3    Inorganic ion transport and metabolism
Q      51      2.0    Secondary metabolites biosynthesis, transport and catabolism
R      340     13.1   General function prediction only
S      208     8.0    Function unknown
-      1,482   38.4   Not in COGs
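
The percentages in Table 4 can be recomputed from the counts. The sketch below assumes, since the table does not state it, that category percentages are taken relative to the sum of all category counts, while the "Not in COGs" percentage is taken relative to the 3,857 total genes; these denominators are inferred because they reproduce the tabulated values checked here. The dictionary and variable names are illustrative only.

```python
# Gene counts per COG category, copied from Table 4.
cog_counts = {
    "J": 157, "A": 0, "K": 163, "L": 131, "B": 1, "D": 30, "Y": 0,
    "V": 63, "T": 184, "M": 236, "N": 10, "Z": 1, "W": 0, "U": 37,
    "O": 112, "C": 126, "G": 102, "E": 217, "F": 67, "H": 118,
    "I": 99, "P": 136, "Q": 51, "R": 340, "S": 208,
}
not_in_cogs = 1_482
total_genes = 3_857

# Assumed denominator: the sum of category assignments. It can exceed the
# number of genes assigned to COGs if a gene falls into several categories.
assigned_total = sum(cog_counts.values())

share = {
    code: round(100.0 * count / assigned_total, 1)
    for code, count in cog_counts.items()
}
not_in_cogs_pct = round(100.0 * not_in_cogs / total_genes, 1)

print("sum of category assignments:", assigned_total)
for code in ("J", "M", "R"):
    print(code, share[code])
print("Not in COGs (%):", not_in_cogs_pct)
```

The sum of category assignments is larger than the 2,375 genes assigned to COGs in Table 3, which would be expected if some genes belong to more than one category.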